Deliverable 4: Neural Networks for Puzzle Reassembly

The previous deliverable established that the clustering algorithm performs poorly on document images, owing to their uniform distribution of contrast and lack of color. A major cause was the metric used for piece-distance calculation, which proved to be the Achilles' heel of the placer algorithm. At this point, therefore, the following alternative was considered:

Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture whose cells use gates to determine how much past information is carried forward across time steps. LSTM is well suited to puzzle reassembly because the task is essentially next-element prediction: at each step the model predicts the best possible next piece, building up a sequence of pieces. This sequence-based model worked better than a feed-forward model such as a CNN and was found to be more accurate. With the CNN, the gradients reaching the earlier layers were very small, so the early convolution filters struggled to learn edge features in the jumbled-up images.
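As an illustration, the sketch below shows one plausible shape for such a model in PyTorch. The framework, the convolutional piece encoder, and all layer sizes are assumptions rather than details from this report: each piece is embedded by a small encoder, the embeddings are fed to an LSTM as a sequence, and the final hidden state is classified into a permutation class (see the next paragraph).

```python
import math
import torch
import torch.nn as nn

class PuzzleLSTM(nn.Module):
    """Sketch of an LSTM-based puzzle reassembler (all sizes illustrative)."""

    def __init__(self, n_pieces=4, embed_dim=128, hidden_dim=256):
        super().__init__()
        n_classes = math.factorial(n_pieces)  # one class per permutation
        # Assumed small CNN encoder mapping each piece image to an embedding.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, embed_dim),
        )
        # LSTM consumes the pieces as a sequence of embeddings.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Final hidden state -> logits over the n_pieces! permutations.
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, pieces):
        # pieces: (batch, n_pieces, 1, H, W) grayscale document fragments
        b, n = pieces.shape[:2]
        emb = self.encoder(pieces.flatten(0, 1)).view(b, n, -1)
        _, (h_n, _) = self.lstm(emb)
        return self.head(h_n[-1])
```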

A softmax (cross-entropy) loss is used, with one class per permutation. For a puzzle of 4 pieces there are 4! = 24 possible permutations, and hence 24 classes. At test time, the 24 softmax outputs are compared and the class with the highest value is chosen as the predicted permutation.
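To make the class encoding concrete, the fragment below sketches how the 24 permutation classes might be indexed and how the argmax over the softmax outputs recovers a piece ordering. The mapping via `itertools.permutations` is an assumed encoding, not one stated in the report.

```python
import itertools
import torch
import torch.nn.functional as F

# Enumerate all 4! = 24 piece orderings; the class index is the
# position of an ordering in this list (an assumed encoding).
PERMS = list(itertools.permutations(range(4)))  # len(PERMS) == 24

def predict_ordering(model, pieces):
    """Pick the permutation whose softmax score is highest."""
    logits = model(pieces)           # (batch, 24)
    probs = F.softmax(logits, dim=-1)
    cls = probs.argmax(dim=-1)       # (batch,)
    return [PERMS[c] for c in cls.tolist()]

# Training would use the standard softmax (cross-entropy) loss:
# loss = F.cross_entropy(model(pieces), target_class)
```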